Hybrid N-gram Probability Estimation in Morphologically Rich Languages
PACLIC 23 / City University of Hong Kong / 3-5 December 200
Specifications and Analysis of the Korean Sentiment Analysis Corpus
This paper describes the two-year endeavor of constructing the Korean Sentiment Analysis Corpus (KOSAC), focusing on the theoretical background and the analysis of the corpus itself. Our aim is to provide a solid theoretical background for the corpus, which reflects the characteristics of the Korean language and comprises approximately 7,744 sentences taken from news articles. The corpus annotation scheme, based on the MPQA, is described along with statistics on the features specified in the corpus. The analysis can serve as a starting point for using the corpus not only for sentiment analysis but also for semantic or pragmatic work on speakers’ attitudes and emotional expressions.